Morphological Decomposition for ASR in German
Abstract
In this contribution we report on our ongoing work in lexical decomposition for automatic speech recognition (ASR). Lexical decomposition is investigated with a twofold goal: lexical coverage optimization and improved automatic letter-to-sound conversion. Whereas morphological decomposition is a widely studied domain in linguistics, our interest is limited here to identifying and processing the statistically most relevant sources of lexical variation in text corpora. Lexical variation is shown to be particularly important for nouns, due to compounding. A set of about 340 decomposition rules has been developed using statistics from 300M words from different newspaper sources (primarily 14 years of the TAZ, the Berliner TAgesZeitung). The out-of-vocabulary (OOV) rate on the same 300M words is reduced from 5.2% to 4.6% in case-sensitive form and to 4.2% in case-insensitive form. For letter-to-sound conversion, cross-morpheme letter sequences are a major source of ambiguity. Decomposition, by reducing these ambiguities, contributes to producing more consistent phonemic transcriptions for pronunciation dictionaries.
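To make the link between compound decomposition and the OOV rate concrete, the following is a minimal Python sketch. It is not the authors' set of about 340 rules: the `LINKERS` tuple, the greedy recursive splitter `decompose`, the helper `oov_rate`, and the toy vocabulary are all hypothetical and chosen only for illustration. Matching is case-insensitive throughout, which is a simplification of the two conditions reported above.

```python
from typing import List, Optional, Sequence, Set

# Hypothetical, simplified set of German linking elements (assumption,
# not taken from the paper's rule set).
LINKERS = ("", "s", "n", "es", "en")


def decompose(word: str, vocab: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split `word` into in-vocabulary parts; return None if no split is found."""
    w = word.lower()
    if w in vocab:
        return [w]
    # Try every split point that leaves at least `min_len` letters on each side.
    for i in range(min_len, len(w) - min_len + 1):
        head = w[:i]
        for linker in LINKERS:
            if linker and not head.endswith(linker):
                continue
            # Strip the linking element to recover the candidate stem.
            stem = head[: len(head) - len(linker)] if linker else head
            if len(stem) < min_len or stem not in vocab:
                continue
            rest = decompose(w[i:], vocab, min_len)
            if rest is not None:
                return [stem] + rest
    return None


def oov_rate(tokens: Sequence[str], vocab: Set[str], use_decomposition: bool = False) -> float:
    """Fraction of tokens not covered by `vocab`, optionally after decomposition."""
    if not tokens:
        return 0.0
    oov = 0
    for tok in tokens:
        if tok.lower() in vocab:
            continue
        if use_decomposition and decompose(tok, vocab) is not None:
            continue
        oov += 1
    return oov / len(tokens)


# Toy data (illustrative only, not from the 300M-word corpus).
vocab = {"die", "zeitung", "tag", "liest", "arbeit", "markt"}
tokens = ["Die", "Tageszeitung", "liest", "Arbeitsmarkt", "xyz"]

print(decompose("Tageszeitung", vocab))
print(oov_rate(tokens, vocab))
print(oov_rate(tokens, vocab, use_decomposition=True))
```

In this toy setting the compounds "Tageszeitung" and "Arbeitsmarkt" are covered once they are split into known parts, so only the genuinely unknown token remains out of vocabulary. A purely lexicon-driven splitter like this one can over-split real text, which is one reason a curated rule set based on corpus statistics, as described above, may be preferable in practice.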
Publication date: 2001